dumpling: improve string key handling, streaming process with chunking by takaidohigasi · Pull Request #62172 · pingcap/tidb

takaidohigasi · 2025-07-03T02:37:30Z

Latest state (2026-04-24)

Updates since initial submission:

Merged master (1a87518). Resolved dumpling/export/dump_test.go import conflict (sync + sync/atomic) and semantic merge in writer_serial_test.go (switched PR's new tests from the removed storage.NewBufferWriter() to master's local NewBufferWriter() / BytesWriter).
Added a large-scale round-trip CI test (919858e): dumpling/tests/composite_string_key_large/ — 500-row events table with shared PK prefix and 300-row translations table with a 3-column composite key and unicode. Runs dumpling at --rows 50, asserts chunks ≥ 2 and every chunk is a self-contained INSERT ending with ; (regression guard for the isLastChunk discussion), then round-trips via tidb-lightning + sync_diff_inspector for MySQL-source vs TiDB-target checksum equality.
Removed LLM-noise helpers and flattened the chunking API (629efbb), in response to @lance6716's review. −700 net lines. Details in issuecomment-4311143074. Highlights: dropped interpolateStringBoundary, buildStringWhereClauses, Config.IsStringChunking, the singular backward-compat concurrentDumpStringField, the redundant small-table guard, and the duplicate SHOW INDEX query. Downgraded per-iteration Info logs to Debug.

Manual verification against the latest head: dumping the same 500-row composite-string-PK dataset with the master binary and this branch's binary produces 1 chunk vs 10 chunks respectively, with byte-identical row content after normalization (sort + strip trailing ,/;).

What problem does this PR solve?

Issue Number: ref #61999

Problem Summary:

dumpling is processed in parallel only for integer PK ref1 ref2
- takes long time to complete
- string pk is handled sequentially, regarded as 1 chunk, resulted in un-predictable estimation of remaining process time.

What changed and how does it work?

Implemented to handle primary keys starting with a string streamingly. writing might be in parallel processing
- chunking and writing in parallel
- chunking (sequential process)
  - cursor based chunking: use composit key condition(also available for MySQL 5.7 or lower or version which we can't use ROW_NUMBER() function)
note: I don't change the existing integer based implementation.

simplified diagram

  Timeline: 0----5----10---15---20---25---30---35---40---45---50

  * BEFORE (Sequential Approach):
 
  Chunking: [------C1------]
  Writing:                  [================W1================]
  Total Time: ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  
  * AFTER (Pipeline/Parallel Approach):

  Chunking: [C1][C2][C3][C4]
  Writing:      [====W1====]
                    [====W2====]
                        [====W3====]
                            [====W4====]
  Total Time: ~~~~~~~~~~~~~~~~~~~~~~~~~~~

chunking is sequential but writing start just after each chunking completed, and run in parallel.

Check List

Tests

Unit test
Integration test
Manual test (add detailed scripts or steps below)

step
1. create records for composit key
2. confirm the number of record
3. execute dumpling with old and new version and compare the number of records and processing time.

v8.5.2

[2025/07/04 07:13:09.746 +09:00] [INFO] [status.go:37] [progress] [tables="40/41 (97.6%)"] ["finished rows"=21907452704] ["estimate total rows"=16365664263] ["finished size"=1.661TB] ["average speed(MiB/s)"=53.78349999284309] ["recent speed bps"=56396034.34047334] ["chunks progress"="99.97 %"]
[2025/07/04 07:13:49.491 +09:00] [INFO] [collector.go:264] ["backup success summary"] [total-ranges=3005] [ranges-succeed=3005] [ranges-failed=0] [total-take=6h32m39.744718607s] [total-kv-size=1.663TB] [average-speed=70.6MB/s] [total-rows=21943318733]

this PR version

[2025/07/04 12:13:58.240 +09:00] [INFO] [status.go:37] [progress] [tables="33/41 (80.5%)"] ["finished rows"=21474305967] ["estimate total rows"=16365664263] ["finished size"=1.644TB] ["average speed(MiB/s)"=387.8366052577693] ["recent speed bps"=406675792.1587332] ["chunks progress"="98.02 %"]
[2025/07/04 12:15:20.846 +09:00] [INFO] [collector.go:264] ["backup success summary"] [total-ranges=21071] [ranges-succeed=21071] [ranges-failed=0] [total-take=4h49m22.606312684s] [total-kv-size=1.663TB] [average-speed=95.79MB/s] [total-rows=21943318733]

  => number of records(tota-rows) is the same, less total-take.

I tested for MySQL 5.7.34-log

I understand version matrix is covered by
https://github.com/pingcap/tidb/blob/master/.github/workflows/integration-test-dumpling.yml#L51-L56

No need to test
- I checked and no code files have been changed.

Side effects

Performance regression: Consumes more CPU
Performance regression: Consumes more Memory
Breaking backward compatibility
no, I don't change dumpling command interface.

Documentation

Affects user behaviors

https://docs.pingcap.com/tidb/stable/dumpling-overview/#export-to-sql-files

The -r option enables the in-table concurrency to speed up the export. The default value is 0, which means disabled. A value greater than 0 means it is enabled, and the value is of INT type. When the source database is TiDB, a -r value greater than 0 indicates that the TiDB region information is used for splitting, and reduces the memory usage. The specific -r value does not affect the split algorithm. When the source database is MySQL and the primary key or the first column of the composite primary key is of the INT type, specifying -r can also enable the in-table concurrency.

=> fixed in
pingcap/docs#21324

Contains syntax changes
no
Contains variable changes
no
Contains experimental features
no
Changes MySQL compatibility
no

Release note

Please refer to Release Notes Language Style Guide to write a quality release note.

improve dumpling string key handling, streaming process with chunking is available

Summary by CodeRabbit

New Features
- Streaming export for composite string primary-key tables (string-key chunking)
- Best-effort real-time progress percentage for streaming chunked exports
Bug Fixes
- Improved chunk boundary generation and stable row ordering for composite-key dumps
- INSERT output now always emits well-formed, terminated statements
Tests
- Expanded unit/integration tests for string-key chunking, SQL escaping, WHERE/cursor generation, progress, and writer chunking
Documentation
- Added integration test README and runnable test suites with fixtures and validation scripts

ti-chi-bot · 2025-07-03T02:37:39Z

Welcome @takaidohigasi!

It looks like this is your first PR to pingcap/tidb 🎉.

I'm the bot to help you request reviewers, add labels and more, See available commands.

We want to make sure your contribution gets all the attention it needs!

Thank you, and welcome to pingcap/tidb. 😃

ti-chi-bot · 2025-07-03T02:37:41Z

Hi @takaidohigasi. Thanks for your PR.

I'm waiting for a pingcap member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

tiprow · 2025-07-03T02:37:46Z

Hi @takaidohigasi. Thanks for your PR.

PRs from untrusted users cannot be marked as trusted with /ok-to-test in this repo meaning untrusted PR authors can never trigger tests themselves. Collaborators can still trigger tests on the PR using /test all.

I understand the commands that are listed here.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

lance6716 · 2025-07-03T04:40:19Z

This is a good idea !

/cc @OliverS929

codecov · 2025-07-03T08:28:05Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 75.5150%. Comparing base (a62a6d1) to head (0610dc4).
⚠️ Report is 1 commits behind head on master.

Additional details and impacted files

@@               Coverage Diff                @@
##             master     #62172        +/-   ##
================================================
- Coverage   77.2798%   75.5150%   -1.7649%     
================================================
  Files          2010       2008         -2     
  Lines        555326     574144     +18818     
================================================
+ Hits         429155     433565      +4410     
- Misses       125251     139687     +14436     
+ Partials        920        892        -28

Flag	Coverage Δ
integration	`41.5177% <ø> (+1.7247%)`	⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Components	Coverage Δ
dumpling	`60.4453% <ø> (-0.0226%)`	⬇️
parser	`∅ <ø> (∅)`
br	`49.9534% <ø> (-13.0557%)`	⬇️

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

An earlier replace_all deletion of `conf.IsStringChunking = false` left the following line over-indented by one tab, which gofmt (and hence nogo) rejects. Apply `gofmt -w`; no functional change. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Critical fix (flagged by CodeRabbit): The producer defer and the last writer callback can both observe `finalized && finished == sent` and each independently called IncCounter + Delete on chunkedTables. The result was a double-count of finishedTablesCounter for every chunked table. Replace the two-step "IncCounter + Delete" at all 4 call sites (concurrentDumpTable, concurrentDumpTiDBPartitionTables, streamStringChunks, and the writer finishTask callback) with a single LoadAndDelete: whichever side wins the race gets loaded==true and performs the increment; the loser silently no-ops. Invariant doc on tableChunkStat is rewritten to describe the actual handshake rather than the incorrect ordering argument the previous comment implied. Minor cleanups in the same bundle (also from CodeRabbit): - dumpTableData now uses "*" for estimateCount on string-leading composite keys, mirroring concurrentDumpTable's trick so estimateTotalRowsCounter doesn't lag when only the varchar column is EXPLAIN'd. Also surfaces the (previously discarded) error from pickupPossibleField at Debug level. - status.go streaming-progress branch clamps progress to <=100%; the two atomic loads of totalChunks and completedChunks can otherwise briefly read an out-of-order pair. Also rewrites the stray "Debug:" comment to explain the intent. - dump_test.go busy-wait on stats.finished replaced with require.Eventually so a stuck goroutine fails the test rather than hanging it to the suite timeout. Not addressed in this commit: CodeRabbit's note on ir_impl_test.go:174's unused sqlmock — that test deserves a focused review rather than a rubber-stamp fix, kept for a follow-up. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Addresses the remaining "known-open" items from the PR review: 1. Remove TestTableDataIRWithCompositeKeys (flagged by CodeRabbit): the test registered an sqlmock expectation that was never consumed, then asserted that a query string contained substrings the test itself had put into it. No behaviour was exercised. 2. Inline buildLowerBoundWhereClause: it was a one-line wrapper that delegated to buildCursorWhereClause, and buildCursorWhereClause already returns exactly the "WHERE >= lower_boundary" semantics the single caller needs. Delete the wrapper, point the caller at buildCursorWhereClause directly, and drop the companion test TestLowerBoundQueryLimitClause which duplicates the coverage already in TestBuildCursorWhereClause. Net: -65 lines, no behaviour change. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

takaidohigasi · 2026-04-24T11:59:31Z

once all the CI, and will cope with coderabbit AI reviews.

Five remaining CodeRabbit comments, all in the two integration-test shell scripts: composite_string_key_large/run.sh: - Generate the events / translations INSERT blocks via python into a tempfile and load with run_sql_file, matching the harness' convention. The previous `python3 … | mysql …` pipe hardcoded 127.0.0.1:3306 and duplicated $DUMPLING_TEST_USER resolution, breaking any future harness-level auth / socket / TLS change. - Replace `for chunk in $(find …)` with `find … -print0 | while read -d ''` (shellcheck SC2044). - Tighten the "ends with ';'" check: `tail -n 1 | case *";)"` instead of `tail -c 3 | grep ';'` so `);` followed by extra bytes is no longer accepted. - Move `export DUMPLING_TEST_DATABASE` up so it's set before any data load (required by run_sql_file). composite_string_key/run.sh: - Replace `ls -1 … | wc -l` with `find … | wc -l` (SC2012) and quote table_name in the glob (SC2086). - Replace the chunk-iteration for-loop glob with `find -print0 | while read -d ''` (SC2231 / SC2044). - Drop `diff -B -w`: the committed fixtures are checked in specifically to lock down exact chunk output, so whitespace- tolerant diff defeats the purpose. Verified locally that the regenerated fixtures (commit 0e2468c) end with `;\n` and match dumpling output byte-for-byte under plain diff. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

coderabbitai

Actionable comments posted: 9

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

dumpling/export/dump.go (1)
960-975: ⚠️ Potential issue | 🟡 Minor

sent.Add(1) runs even when the context is already done.

sent is incremented unconditionally before the select, so a ctx‑done path returns true with sent already bumped but the task never actually enqueued. When the producer's defer then fires, finished != sent forever and the entry is not cleaned up; the Dumper is about to exit, so this is bounded by its lifetime, but it also prevents the terminal IncCounter for an in‑flight table on cancellation — which matches the "did not finish" semantics, but it's worth confirming that is the intended behavior.

A safer ordering is to only bump sent on successful hand‑off:
 func (d *Dumper) sendTaskToChan(tctx *tcontext.Context, task Task, taskChan chan<- Task) (ctxDone bool) {
-	if td, ok := task.(*TaskTableData); ok {
-		if val, ok := d.chunkedTables.Load(td.Meta.ChunkKey()); ok {
-			val.(*tableChunkStat).sent.Add(1)
-		}
-	}
 	select {
 	case <-tctx.Done():
 		return true
 	case taskChan <- task:
+		if td, ok := task.(*TaskTableData); ok {
+			if val, ok := d.chunkedTables.Load(td.Meta.ChunkKey()); ok {
+				val.(*tableChunkStat).sent.Add(1)
+			}
+		}
 		tctx.L().Debug("send task to writer",
 			zap.String("task", task.Brief()))
 		DecGauge(d.metrics.taskChannelCapacity)
 		return false
 	}
 }
Note the small window where a writer could process and increment finished before the producer's sent.Add lands is still tolerated by the handshake: finalized can only be set after all sendTaskToChan calls have returned, at which point any pending sent.Add has completed.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@dumpling/export/dump.go` around lines 960 - 975, The current implementation
in Dumper.sendTaskToChan increments tableChunkStat.sent unconditionally when
task is a *TaskTableData, which can happen even if tctx is already done and the
task never enqueued; move the sent.Add(1) so it only runs on successful handoff:
remove the pre-select val.(*tableChunkStat).sent.Add(1) and instead, inside the
select case that performs taskChan <- task (i.e., after the send succeeds), call
val.(*tableChunkStat).sent.Add(1) before returning; keep the rest of the logic
(logging, DecGauge, return value) intact to preserve the handshake semantics
with TaskTableData, chunkedTables and tableChunkStat.

🧹 Nitpick comments (9)

dumpling/export/writer_serial_test.go (2)
449-494: TestWriteInsertWithStatementSizeLimit and TestWriteInsertMultipleStatements are largely duplicative.

Both construct a mock table, set a small StatementSize, and assert (a) Count("INSERT INTO ... VALUES") > 1 and (b) each split chunk ends with ;. The only meaningful difference is the dataset/size tuple (6 rows @ 150 bytes vs 4 rows @ 50 bytes). Consider a single table-driven test to reduce maintenance cost and keep the suite minimal per the test guideline.
♻️ Sketch
func TestWriteInsertStatementSizeChunking(t *testing.T) {
    cases := []struct {
        name          string
        data          [][]driver.Value
        colTypes      []string
        tableName     string
        statementSize uint64
    }{
        {"users_6rows_150", /*...*/, 150},
        {"items_4rows_50",  /*...*/, 50},
    }
    for _, tc := range cases {
        t.Run(tc.name, func(t *testing.T) { /* shared assertions */ })
    }
}
As per coding guidelines: "Keep test changes minimal and deterministic; avoid broad golden/testdata churn unless required."

Also applies to: 535-571
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@dumpling/export/writer_serial_test.go` around lines 449 - 494, Two tests
(TestWriteInsertWithStatementSizeLimit and TestWriteInsertMultipleStatements)
duplicate logic by creating mock tableIR data, setting a small statementSize,
calling WriteInsert and asserting multiple INSERT chunks and semicolon
termination; consolidate them into a single table-driven test (e.g.,
TestWriteInsertStatementSizeChunking) that iterates cases with different
data/statementSize tuples, reuses the same setup (createMockConfig,
newMockTableIR, NewBufferWriter, configForWriteSQL, WriteInsert, and metrics)
and performs the shared assertions (count of "INSERT INTO `...` VALUES", each
chunk ends with ';', and row count) inside t.Run subtests to remove duplication
and keep tests deterministic.
449-494: Harden the row-presence assertion.

Line 493 counts ( characters as a proxy for row count. It happens to work because none of the test values contain (, but it will silently misreport if test data ever includes a parenthesis, and it does not actually prove that every input row made it to the output across chunk boundaries. Prefer using the returned n (already verified) plus a per-chunk row count, e.g. splitting on the INSERT prefix, trimming the ;\n, and summing the number of ),\n/);\n row terminators. At minimum, count ),\n + );\n instead of (.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@dumpling/export/writer_serial_test.go` around lines 449 - 494, The current
row-presence assertion in TestWriteInsertWithStatementSizeLimit is fragile
because it counts "(" characters; instead, verify rows by parsing the generated
INSERT chunks: split output on the INSERT prefix ("INSERT INTO `users` VALUES"),
for each non-empty chunk count row terminators by summing occurrences of "),\n"
and ");\n" (or trimming the trailing ";\n" and counting occurrences of "),\n"
plus one per non-empty chunk if needed), and assert that this total equals the
returned n (or 6); update the assertion that currently uses
strings.Count(output, "(") to use this more robust per-chunk terminator counting
so WriteInsert and TestWriteInsertWithStatementSizeLimit are resilient to
parentheses in data and chunk boundaries.
dumpling/export/sql_util.go (2)
124-129: Stale godoc fragment above escapeSQLString.

The // pickupPossibleFieldsForStringChunking returns all columns… line documents a different (removed?) function but is physically attached to escapeSQLString, which confuses go doc output and readers. Drop that line (or move it if the function still exists elsewhere).
♻️ Proposed cleanup
-// pickupPossibleFieldsForStringChunking returns all columns of the selected index for composite key chunking
-// escapeSQLString properly escapes a string for use in internal SQL boundary condition queries.
+// escapeSQLString properly escapes a string for use in internal SQL boundary condition queries.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@dumpling/export/sql_util.go` around lines 124 - 129, The godoc above
escapeSQLString contains a stale fragment referencing
pickupPossibleFieldsForStringChunking; remove that stray line (or relocate it to
the correct function if that function still exists) so the comment block
immediately preceding escapeSQLString documents only escapeSQLString; update the
comment text to accurately describe escapeSQLString and ensure no leftover
references to pickupPossibleFieldsForStringChunking remain.
94-116: Unique-key tie-break is non-deterministic.

When multiple unique keys share the same column count, the chosen bestKey depends on Go map iteration order, so the chunking key can differ run-to-run for the same schema. If determinism matters for reproducibility/debugging of chunk boundaries, break ties by keyName (lexicographic) or by the first key seen in SHOW INDEX order.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@dumpling/export/sql_util.go` around lines 94 - 116, The current tie-break for
selecting bestKey from uniqueKeyMap is non-deterministic because map iteration
order can vary; update the selection logic in the block that iterates
uniqueKeyMap so ties on keyInfo.count are broken deterministically: collect the
map keys, sort them lexicographically (keyName), then iterate the sorted key
list to pick the key with the smallest count (if equal counts, the
lexicographically smallest keyName wins). Ensure you update references to
uniqueKeyMap, bestKey and keyInfo.count so the final chosen bestKey is
deterministic across runs; alternatively sort by the original SHOW INDEX order
if that ordering is available.
dumpling/export/ir_impl_test.go (1)
149-185: Test name overstates coverage.

TestRowIterWithStringKeyProgress exercises newRowIter / Decode / HasNext / Close against sqlmock rows, which is fine, but there is nothing "string-key" or "progress"-specific about it — the behavior under test is identical for numeric rows, and progress counters are not touched. Consider renaming (e.g. TestRowIterStringKeyDecode) so failures point to the actual surface under test.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@dumpling/export/ir_impl_test.go` around lines 149 - 185, Rename the test
function to accurately reflect what's being tested: change
TestRowIterWithStringKeyProgress to a name like TestRowIterStringKeyDecode (or
TestRowIterDecode) to indicate it exercises newRowIter, Decode, HasNext and
Close against sqlmock rows; update the test function declaration and any
references to that identifier so failures point to the correct surface under
test.
dumpling/export/dump_test.go (1)
1047-1083: Optional: add compile-time check that mockTableMeta satisfies TableMeta.

The comment claims the mock implements TableMeta, but only a handful of methods are defined and nothing enforces the claim. If the real interface grows a method, this will not break and silently drift. A one-liner var _ TableMeta = (*mockTableMeta)(nil) makes the contract explicit; alternatively drop the comment since tests only use ChunkKey().
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@dumpling/export/dump_test.go` around lines 1047 - 1083, Add a compile-time
assertion that mockTableMeta implements the TableMeta interface by adding a var
declaration like var _ TableMeta = (*mockTableMeta)(nil) near the mockTableMeta
type; this ensures the compiler will fail if TableMeta gains new methods and
keeps the test mock in sync with the real interface (reference mockTableMeta and
TableMeta).
dumpling/export/string_chunking_test.go (1)
107-130: Consider asserting the empty-input behavior matches intent.

{"", []string{""}} locks in that extractOrderByColumns("") returns a one-element slice containing "" rather than an empty slice. That is fine as a contract, but callers that do len(cols) > 0 to decide whether an ORDER BY exists will incorrectly see "one column". A short comment next to this case, or a helper that returns nil on empty input, would prevent the mis-use.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@dumpling/export/string_chunking_test.go` around lines 107 - 130, The test
currently asserts that extractOrderByColumns("") returns a one-element slice
[""], which can mislead callers that check len(cols)>0; update
TestExtractOrderByColumns to either expect an empty (nil) slice for the
empty-input case (replace {"", []string{""}} with {"", []string{}} or nil
expectation) and add a one-line comment in the test beside that case clarifying
the contract for empty input, or alternatively change extractOrderByColumns to
return nil on empty input and update the test accordingly; reference the test
TestExtractOrderByColumns and the function extractOrderByColumns when making the
change.
dumpling/export/dump.go (1)
2161-2166: Stale comment in newTaskTableData.

// Chunking mode is already set at table level in concurrentDumpTable

Nothing visible in newTaskTableData or NewTaskTableData consults a "chunking mode" flag, and concurrentDumpTable doesn't set one. Looks like a leftover from an earlier iteration — please either remove it or update it to describe what the wrapper actually adds (the totalChunks.Add(1) bookkeeping).
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@dumpling/export/dump.go` around lines 2161 - 2166, The inline comment in
newTaskTableData is stale and misleading: remove or replace "// Chunking mode is
already set at table level in concurrentDumpTable" with a concise comment that
describes what this wrapper actually does (increments d.metrics.totalChunks and
constructs a TaskTableData via NewTaskTableData); ensure no other code relies on
a non-existent "chunking mode" flag in concurrentDumpTable or NewTaskTableData.
dumpling/export/string_chunking.go (1)
201-216: Sampling error downgrades silently; consider surfacing it.

On QuerySQL failure the loop just logs a Warn and breaks, then the post-loop branch (line 257+) appends a "rest of the table" tail chunk. That means the dump itself still succeeds but degrades to a single huge tail chunk — defeating the parallelism the feature exists to provide, without any error signal to the caller. Consider returning the error (or at least an AddCounter to a dedicated failure metric) so operators notice. If the current behavior is intentional (e.g., to stay compatible with best-effort master behavior), a short comment here explaining that contract would help future readers.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@dumpling/export/string_chunking.go` around lines 201 - 216, The sampling loop
currently swallows QuerySQL errors (logging a Warn and break) which causes a
silent fallback to a single tail chunk; update the loop in string_chunking.go to
surface sampling failures: on QuerySQL error (the branch that logs via
tctx.L().Warn with db/tbl/i and err) either return the error up to the caller
(propagate the error from the function that runs the loop) or increment a
dedicated failure metric (e.g., tctx.AddCounter or a specific telemetry counter)
before breaking, and add a short comment stating the chosen contract
(best-effort vs. fail-fast); specifically change the error-handling where
currentBoundary and QuerySQL are used so callers/operators are notified instead
of silently degrading parallelism.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@dumpling/export/dump_test.go`:
- Around line 951-1045: The tests TestDumperChunkedTablesManagement and
TestStreamingChunkProgressIntegration only exercise sync.Map/atomic counters and
not the dumper's real cleanup path; update them to exercise the actual cleanup
logic in dump.go by driving the producer/consumer callbacks that own
Dumper.chunkedTables (e.g., invoke the finish-task callback and the producer
defer or call the same code path used in streamStringChunks) so the
LoadAndDelete atomic claim and the finalized.Store(true)/finished==sent ordering
are exercised; alternatively narrow the tests to only assert tableChunkStat
helpers (newTableChunkStat, sent/finished/finalized behavior) and remove any
simulated Delete logic so they don’t give false confidence about dump.go’s
cleanup.

In `@dumpling/export/dump.go`:
- Around line 1129-1139: The defer that registers chunk tracking (creating
newTableChunkStat, storing it in d.chunkedTables, and conditionally
IncCounter(d.metrics.finishedTablesCounter) when finalized) is only used in
concurrentDumpTiDBPartitionTables and missing from the sibling producers
concurrentDumpTiDBTables and concurrentDumpTiDBPartitionTablesWithTableSample;
extract that logic into a helper (e.g. beginChunkTracking(meta TableMeta)
returning a finalizer) which creates the tableChunkStat, stores it in
d.chunkedTables, and when called sets finalized and checks finished==sent then
LoadAndDelete the key and IncCounter; call defer d.beginChunkTracking(meta)()
from concurrentDumpTiDBTables and
concurrentDumpTiDBPartitionTablesWithTableSample so chunks produced by those
paths are tracked and the finishedTablesCounter is incremented exactly once per
table.
- Around line 2168-2208: extractOrderByColumns currently returns []string{""}
when given an empty clause which leads to invalid SQL downstream; update
extractOrderByColumns to return nil (or an empty slice) when there are no
columns after trimming the "ORDER BY " prefix so callers like streamStringChunks
can handle/abort cleanly, and add a short comment on the expected exact prefix
format ("ORDER BY " is expected, case‑sensitive, single space) so callers know
to supply the canonical form or pre-normalize before calling; reference function
name extractOrderByColumns and the caller streamStringChunks/buildOrderByClause
in your change.
- Around line 929-938: The regression leaks chunkedTables entries because
concurrentDumpTable (and streamStringChunks via concurrentDumpStringFields)
creates a chunkedTables entry whose sent counter is incremented by
sendTaskToChan but handleSubTask (used in buildConcatTask) consumes those
sub‑tasks without incrementing finished, so defer cleanup never runs; to fix,
either (preferred) update buildConcatTask's handleSubTask to increment the
corresponding chunkStats.finished for each consumed sub‑task (matching the sent
increments) before consuming the task, or thread a "doNotTrack" boolean through
concurrentDumpTable/concurrentDumpStringFields/streamStringChunks so they skip
creating/storing chunkStats in chunkedTables when buildConcatTask is the caller;
ensure the same unique symbols are updated: buildConcatTask, handleSubTask,
concurrentDumpTable, streamStringChunks, concurrentDumpStringFields,
sendTaskToChan, chunkedTables, chunkStats.finished, chunkStats.sent, and
finishedTablesCounter.

In `@dumpling/export/ir_impl_test.go`:
- Around line 132-147: The two tests TestNewTableDataWithImprovedChunking and
TestTableDataQueryValidation are tautological — they only round-trip a
handcrafted query through newTableData and assert substrings of the same input
without exercising chunking or the expectValid=false path; fix by either
removing these redundant tests or converting them to real behavioral tests: call
the chunking/iterator functions that consume the newTableData (e.g., the chunk
generator or iterator used elsewhere in the package) and assert on produced
chunk boundaries/rows and that expectValid is exercised with both true and
false, or delete the tests entirely to avoid dead coverage; update references to
newTableData, TestNewTableDataWithImprovedChunking,
TestTableDataQueryValidation, and the expectValid flag when making the change.

In `@dumpling/export/sql_util.go`:
- Around line 142-260: Both buildCursorWhereClause and
buildUpperBoundWhereClause assume columnNames and boundary/upperBoundary have
equal lengths, which can lead to panics or malformed SQL when they don't; fix by
adding a length guard at the start (e.g., if len(boundary) != len(columnNames)
return "" for buildCursorWhereClause and similarly for
buildUpperBoundWhereClause) or explicitly compute n := min(len(columnNames),
len(boundary)) and use n for sizing quotedCols/escapedBoundary and all loops so
you never index past escapedBoundary and only generate predicates for matching
pairs; update references to quotedCols/escapedBoundary and the loop ranges to
use n consistently to prevent empty RHS fragments or out-of-bounds writes.

In `@dumpling/export/string_chunking.go`:
- Line 1: Add the standard TiDB license header to the top of the new file
string_chunking.go (package export) so it matches other files like dump.go; copy
the header comment block from dumpling/export/dump.go (including the "Copyright
2020 PingCAP, Inc. Licensed under Apache-2.0." line), update the year if
necessary, and place it before the package declaration.

In `@dumpling/tests/composite_string_key_large/run.sh`:
- Around line 57-73: The Python here-doc building the INSERT does not escape
backslashes so the value in variable raw (and payload) loses the backslash when
MySQL interprets the literal; update the generation so backslashes are doubled
before quoting (e.g., escape raw's "\" into "\\") or alternatively emit a SET
SESSION sql_mode='NO_BACKSLASH_ESCAPES' before the INSERT; locate the here-doc
that defines raw/payload/rows in the script (the python block that builds rows
and calls run_sql_file) and either add payload = raw.replace("\\",
"\\\\").replace("'", "''") or emit the SQL mode change prior to the INSERT to
ensure backslashes are preserved.

In `@dumpling/tests/composite_string_key/run.sh`:
- Line 1: The script uses bash-only syntax `read -d ''` but declares a POSIX
shebang; either change the shebang to a bash interpreter (e.g. `#!/usr/bin/env
bash`) so `read -d ''` is valid, or replace the `read -d ''` loop with a
POSIX-compatible pattern (for example: use `find ... -print0 | xargs -0 ...` or
`find ... -print0 | while IFS= read -r -d ''` equivalent avoided by using `xargs
-0` and a sh-compatible command) to iterate safely; update the shebang or the
loop surrounding the `read -d ''` usage accordingly so the script runs under
dash `/bin/sh` as-is or under bash when using bash features.

---

Outside diff comments:
In `@dumpling/export/dump.go`:
- Around line 960-975: The current implementation in Dumper.sendTaskToChan
increments tableChunkStat.sent unconditionally when task is a *TaskTableData,
which can happen even if tctx is already done and the task never enqueued; move
the sent.Add(1) so it only runs on successful handoff: remove the pre-select
val.(*tableChunkStat).sent.Add(1) and instead, inside the select case that
performs taskChan <- task (i.e., after the send succeeds), call
val.(*tableChunkStat).sent.Add(1) before returning; keep the rest of the logic
(logging, DecGauge, return value) intact to preserve the handshake semantics
with TaskTableData, chunkedTables and tableChunkStat.

---

Nitpick comments:
In `@dumpling/export/dump_test.go`:
- Around line 1047-1083: Add a compile-time assertion that mockTableMeta
implements the TableMeta interface by adding a var declaration like var _
TableMeta = (*mockTableMeta)(nil) near the mockTableMeta type; this ensures the
compiler will fail if TableMeta gains new methods and keeps the test mock in
sync with the real interface (reference mockTableMeta and TableMeta).

In `@dumpling/export/dump.go`:
- Around line 2161-2166: The inline comment in newTaskTableData is stale and
misleading: remove or replace "// Chunking mode is already set at table level in
concurrentDumpTable" with a concise comment that describes what this wrapper
actually does (increments d.metrics.totalChunks and constructs a TaskTableData
via NewTaskTableData); ensure no other code relies on a non-existent "chunking
mode" flag in concurrentDumpTable or NewTaskTableData.

In `@dumpling/export/ir_impl_test.go`:
- Around line 149-185: Rename the test function to accurately reflect what's
being tested: change TestRowIterWithStringKeyProgress to a name like
TestRowIterStringKeyDecode (or TestRowIterDecode) to indicate it exercises
newRowIter, Decode, HasNext and Close against sqlmock rows; update the test
function declaration and any references to that identifier so failures point to
the correct surface under test.

In `@dumpling/export/sql_util.go`:
- Around line 124-129: The godoc above escapeSQLString contains a stale fragment
referencing pickupPossibleFieldsForStringChunking; remove that stray line (or
relocate it to the correct function if that function still exists) so the
comment block immediately preceding escapeSQLString documents only
escapeSQLString; update the comment text to accurately describe escapeSQLString
and ensure no leftover references to pickupPossibleFieldsForStringChunking
remain.
- Around line 94-116: The current tie-break for selecting bestKey from
uniqueKeyMap is non-deterministic because map iteration order can vary; update
the selection logic in the block that iterates uniqueKeyMap so ties on
keyInfo.count are broken deterministically: collect the map keys, sort them
lexicographically (keyName), then iterate the sorted key list to pick the key
with the smallest count (if equal counts, the lexicographically smallest keyName
wins). Ensure you update references to uniqueKeyMap, bestKey and keyInfo.count
so the final chosen bestKey is deterministic across runs; alternatively sort by
the original SHOW INDEX order if that ordering is available.

In `@dumpling/export/string_chunking_test.go`:
- Around line 107-130: The test currently asserts that extractOrderByColumns("")
returns a one-element slice [""], which can mislead callers that check
len(cols)>0; update TestExtractOrderByColumns to either expect an empty (nil)
slice for the empty-input case (replace {"", []string{""}} with {"", []string{}}
or nil expectation) and add a one-line comment in the test beside that case
clarifying the contract for empty input, or alternatively change
extractOrderByColumns to return nil on empty input and update the test
accordingly; reference the test TestExtractOrderByColumns and the function
extractOrderByColumns when making the change.

In `@dumpling/export/string_chunking.go`:
- Around line 201-216: The sampling loop currently swallows QuerySQL errors
(logging a Warn and break) which causes a silent fallback to a single tail
chunk; update the loop in string_chunking.go to surface sampling failures: on
QuerySQL error (the branch that logs via tctx.L().Warn with db/tbl/i and err)
either return the error up to the caller (propagate the error from the function
that runs the loop) or increment a dedicated failure metric (e.g.,
tctx.AddCounter or a specific telemetry counter) before breaking, and add a
short comment stating the chosen contract (best-effort vs. fail-fast);
specifically change the error-handling where currentBoundary and QuerySQL are
used so callers/operators are notified instead of silently degrading
parallelism.

In `@dumpling/export/writer_serial_test.go`:
- Around line 449-494: Two tests (TestWriteInsertWithStatementSizeLimit and
TestWriteInsertMultipleStatements) duplicate logic by creating mock tableIR
data, setting a small statementSize, calling WriteInsert and asserting multiple
INSERT chunks and semicolon termination; consolidate them into a single
table-driven test (e.g., TestWriteInsertStatementSizeChunking) that iterates
cases with different data/statementSize tuples, reuses the same setup
(createMockConfig, newMockTableIR, NewBufferWriter, configForWriteSQL,
WriteInsert, and metrics) and performs the shared assertions (count of "INSERT
INTO `...` VALUES", each chunk ends with ';', and row count) inside t.Run
subtests to remove duplication and keep tests deterministic.
- Around line 449-494: The current row-presence assertion in
TestWriteInsertWithStatementSizeLimit is fragile because it counts "("
characters; instead, verify rows by parsing the generated INSERT chunks: split
output on the INSERT prefix ("INSERT INTO `users` VALUES"), for each non-empty
chunk count row terminators by summing occurrences of "),\n" and ");\n" (or
trimming the trailing ";\n" and counting occurrences of "),\n" plus one per
non-empty chunk if needed), and assert that this total equals the returned n (or
6); update the assertion that currently uses strings.Count(output, "(") to use
this more robust per-chunk terminator counting so WriteInsert and
TestWriteInsertWithStatementSizeLimit are resilient to parentheses in data and
chunk boundaries.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: f0144534-1401-45e0-92e9-715bb7c9602a

📥 Commits

Reviewing files that changed from the base of the PR and between bbbc580 and 4e7015b.

📒 Files selected for processing (10)

dumpling/export/dump.go
dumpling/export/dump_test.go
dumpling/export/ir_impl_test.go
dumpling/export/sql_util.go
dumpling/export/status.go
dumpling/export/string_chunking.go
dumpling/export/string_chunking_test.go
dumpling/export/writer_serial_test.go
dumpling/tests/composite_string_key/run.sh
dumpling/tests/composite_string_key_large/run.sh

…AMPLE paths - buildConcatTask: drop the chunkedTables entry that the inner concurrentDumpTable left behind. handleSubTask intercepts every sub-task without bumping `finished`, so the inner defer's `finished == sent` check is false and the entry is never cleaned up. The follow-up concat task then sees a stale entry, sent gets bumped to N+1, and finishedTablesCounter never fires. Deleting the entry here lets the concat (or fallback) task fall through the no-tracking branch in startWriters and increment the counter exactly once per table. - concurrentDumpTiDBTables / concurrentDumpTiDBPartitionTablesWithTableSample: add chunkedTables tracking via a new beginChunkTracking helper. Previously these producers emitted N TaskTableData tasks per table without populating chunkedTables, so each chunk hit the no-tracking branch and finishedTablesCounter was over-counted by (N-1) per table. - Replace the inline tracking block in concurrentDumpTiDBPartitionTables with the same helper. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…lumns - Remove TestDumperChunkedTablesManagement and TestStreamingChunkProgressIntegration (and the now-unused mockTableMeta) from dump_test.go. They inlined the cleanup condition in the test body instead of exercising the real producer/consumer callbacks, so a regression in the LoadAndDelete atomic claim or the finalized/finished/sent ordering would not be caught. - Remove TestNewTableDataWithImprovedChunking and TestTableDataQueryValidation from ir_impl_test.go. They only round-tripped a constructed query string through newTableData and asserted substrings of the same string; expectValid was never exercised with false. - extractOrderByColumns now returns nil (instead of []string{""}) when the trimmed clause is empty, so callers do not synthesize SELECT lists or WHERE predicates from an empty column name. - Update the corresponding table-driven test row in string_chunking_test.go. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…ping - string_chunking.go: add the standard TiDB Apache-2.0 copyright header (matches the rest of the package). - composite_string_key/run.sh and composite_string_key_large/run.sh: switch shebang to /usr/bin/env bash. Both use `read -r -d ''`, a bash extension that POSIX /bin/sh (e.g. dash) does not implement; under dash the script would silently misparse the loop. - composite_string_key_large/run.sh: also escape backslashes in the Python here-doc payload (replace("\\", "\\\\") before the existing single-quote escape). MySQL's default sql_mode treats backslash as the escape character, so without this the literal backslash is consumed and the test does not actually exercise backslash preservation through the dump/import path. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

takaidohigasi · 2026-04-24T23:39:57Z

fixed for the coderabbitai review.

coderabbitai

Actionable comments posted: 3

🧹 Nitpick comments (4)

dumpling/export/dump.go (2)

936-945: Refactor: reuse beginChunkTracking to remove the duplicated handshake.

The newly-introduced beginChunkTracking (lines 1019–1030) implements exactly the Store + finalize defer that this block does inline. Switching to the helper keeps the integer and TiDB‑sample paths in sync if the handshake ever changes.

♻️ Proposed refactor

 	chunkIndex := 0
 	nullValueCondition := ""
-	chunkStats := newTableChunkStat()
-	d.chunkedTables.Store(meta.ChunkKey(), chunkStats)
-	defer func() {
-		chunkStats.finalized.Store(true)
-		if chunkStats.finished.Load() == chunkStats.sent.Load() {
-			if _, loaded := d.chunkedTables.LoadAndDelete(meta.ChunkKey()); loaded {
-				IncCounter(d.metrics.finishedTablesCounter)
-			}
-		}
-	}()
+	defer d.beginChunkTracking(meta)()

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@dumpling/export/dump.go` around lines 936 - 945, Replace the inline
chunk-tracking setup in dump.go (the new chunkStats := newTableChunkStat();
d.chunkedTables.Store(meta.ChunkKey(), chunkStats); defer { ... } block) with a
call to the existing helper beginChunkTracking so both paths share the same
Store + finalize defer logic; specifically, remove the manual
creation/storage/finalize defer and invoke beginChunkTracking(meta.ChunkKey(),
chunkStats) (or adapt to beginChunkTracking’s parameter list) so that chunkStats
and the finalize behavior are handled by beginChunkTracking, keeping integer and
TiDB-sample paths consistent.

883-896: Minor: log the fallback COUNT(*) failure at debug.

If QuerySQL fails or returns an invalid NullInt64, count is left at the EXPLAIN value and the function silently drops into the count < conf.Rows sequential branch — exactly the case this fallback was added to avoid. Logging the discarded error makes the silent downgrade diagnosable. As per coding guidelines: Keep error handling actionable and contextual; avoid silently swallowing errors.

📝 Proposed diff

-		err := conn.QuerySQL(tctx, func(rows *sql.Rows) error {
+		qErr := conn.QuerySQL(tctx, func(rows *sql.Rows) error {
 			return rows.Scan(&directCount)
 		}, func() { directCount = sql.NullInt64{} }, countQuery)
-		if err == nil && directCount.Valid && directCount.Int64 > 0 {
+		if qErr == nil && directCount.Valid && directCount.Int64 > 0 {
 			count = uint64(directCount.Int64)
+		} else if qErr != nil {
+			tctx.L().Debug("authoritative COUNT(*) failed; keeping EXPLAIN estimate",
+				zap.String("database", db), zap.String("table", tbl),
+				log.ShortError(qErr))
 		}

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@dumpling/export/dump.go` around lines 883 - 896, When the fallback COUNT(*)
query fails or returns an invalid value, add a debug-level log so the downgrade
from EXPLAIN-derived count to sequential scan is observable; specifically, in
the block that calls QuerySQL (referencing QuerySQL, countQuery and directCount)
emit a debug log including the error when err != nil and also log when
directCount.Valid is false (mentioning the countQuery and conf.Rows/context),
then proceed with existing fallback behavior.

dumpling/export/string_chunking_test.go (1)

153-160: Optional: this test is tautological.

TestMaxChunkLimitSafety only re-asserts the literal value of the maxChunkLimit constant. It doesn't exercise the loop guard in streamStringChunks (line 117), so it gives no coverage of the safety behavior — only locks the constant in place. Either drop it, or flip it to assert that streamStringChunks actually breaks out of the boundary-sampling loop and emits a warning when the limit is reached (e.g., via a fake conn that always returns a new boundary).
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@dumpling/export/string_chunking_test.go` around lines 153 - 160, The test
TestMaxChunkLimitSafety is tautological because it only asserts the literal
maxChunkLimit constant; update it to exercise the loop guard in
streamStringChunks instead: create a fake connection/mock that simulates an
infinite/new boundary scenario (always returns a new boundary or no progress)
and call streamStringChunks, then assert that the function breaks out once
maxChunkLimit is reached and emits the expected warning/log or closes the
stream; alternatively, if you prefer not to test runtime behavior, remove
TestMaxChunkLimitSafety to avoid locking the constant value in tests.

dumpling/export/string_chunking.go (1)

89-93: Minor: defer ignores sendBufferedChunk(true) error.

If the context is already cancelled when the defer runs, sendBufferedChunk(true) will return tctx.Err() and the buffered final chunk silently disappears. The Dump is shutting down anyway, but the dropped error makes the abort harder to diagnose. Either log it or rely on the surrounding tctx-cancellation path to surface it.

📝 Suggested change

-		// Send any remaining buffered chunk as the last chunk
-		if buffer != nil {
-			_ = sendBufferedChunk(true) // Mark as last chunk
-		}
+		// Send any remaining buffered chunk as the last chunk
+		if buffer != nil {
+			if err := sendBufferedChunk(true); err != nil {
+				tctx.L().Debug("failed to flush final string chunk on shutdown",
+					zap.String("database", db),
+					zap.String("table", tbl),
+					zap.Error(err))
+			}
+		}

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@dumpling/export/string_chunking.go` around lines 89 - 93, Defer currently
calls sendBufferedChunk(true) and drops its error; change the defer to capture
the returned error and surface it (e.g., log it) unless it's the expected
context-cancellation error from tctx.Err(); specifically, in the deferred
closure around buffer and sendBufferedChunk, assign err :=
sendBufferedChunk(true) and if err != nil && err != tctx.Err() (or simply if err
!= nil) write a diagnostic via the package logger or fmt to record the failure
so the final chunk loss is visible for debugging.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@dumpling/export/string_chunking_test.go`:
- Line 1: The new test file string_chunking_test.go is missing the standard TiDB
license header; add the same header used in nearby files (e.g., dump.go and
string_chunking.go) at the very top of string_chunking_test.go — replace the
placeholder year if needed and ensure the header comment line reads like the
other files: “// Copyright <year> PingCAP, Inc. Licensed under Apache-2.0.” so
the package export test file matches project licensing headers.

In `@dumpling/export/string_chunking.go`:
- Around line 161-201: The reset callback passed to conn.QuerySQL is a no-op, so
a retried query can leave the outer-scope slice currentBoundary set from a
previous attempt and leak stale boundaries; change the reset callback (the third
argument to conn.QuerySQL) from func() {} to a function that clears the captured
boundary (e.g., func() { currentBoundary = nil }) so any partial/previous
boundaries are discarded on retry; ensure this change refers to currentBoundary
and the conn.QuerySQL call so the correct closure is cleared.
- Around line 230-244: The logs show the -1 sentinel for unknown TotalChunks
because Brief() in task.go formats TotalChunks directly; update Brief() to check
for negative/unknown totals (e.g., TotalChunks < 0) and render a human-friendly
placeholder like "?" or "unknown" instead of printing -1, so intermediate chunks
created via newTaskTableData(..., totalChunks: -1) produce clear log output;
keep writer.go's finality check (ChunkIndex+1 == TotalChunks) unchanged so
behavior is preserved.

---

Nitpick comments:
In `@dumpling/export/dump.go`:
- Around line 936-945: Replace the inline chunk-tracking setup in dump.go (the
new chunkStats := newTableChunkStat(); d.chunkedTables.Store(meta.ChunkKey(),
chunkStats); defer { ... } block) with a call to the existing helper
beginChunkTracking so both paths share the same Store + finalize defer logic;
specifically, remove the manual creation/storage/finalize defer and invoke
beginChunkTracking(meta.ChunkKey(), chunkStats) (or adapt to
beginChunkTracking’s parameter list) so that chunkStats and the finalize
behavior are handled by beginChunkTracking, keeping integer and TiDB-sample
paths consistent.
- Around line 883-896: When the fallback COUNT(*) query fails or returns an
invalid value, add a debug-level log so the downgrade from EXPLAIN-derived count
to sequential scan is observable; specifically, in the block that calls QuerySQL
(referencing QuerySQL, countQuery and directCount) emit a debug log including
the error when err != nil and also log when directCount.Valid is false
(mentioning the countQuery and conf.Rows/context), then proceed with existing
fallback behavior.

In `@dumpling/export/string_chunking_test.go`:
- Around line 153-160: The test TestMaxChunkLimitSafety is tautological because
it only asserts the literal maxChunkLimit constant; update it to exercise the
loop guard in streamStringChunks instead: create a fake connection/mock that
simulates an infinite/new boundary scenario (always returns a new boundary or no
progress) and call streamStringChunks, then assert that the function breaks out
once maxChunkLimit is reached and emits the expected warning/log or closes the
stream; alternatively, if you prefer not to test runtime behavior, remove
TestMaxChunkLimitSafety to avoid locking the constant value in tests.

In `@dumpling/export/string_chunking.go`:
- Around line 89-93: Defer currently calls sendBufferedChunk(true) and drops its
error; change the defer to capture the returned error and surface it (e.g., log
it) unless it's the expected context-cancellation error from tctx.Err();
specifically, in the deferred closure around buffer and sendBufferedChunk,
assign err := sendBufferedChunk(true) and if err != nil && err != tctx.Err() (or
simply if err != nil) write a diagnostic via the package logger or fmt to record
the failure so the final chunk loss is visible for debugging.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: c163318d-a38a-4640-add4-eb73a7a838d9

📥 Commits

Reviewing files that changed from the base of the PR and between 4e7015b and 5068102.

📒 Files selected for processing (7)

dumpling/export/dump.go
dumpling/export/dump_test.go
dumpling/export/ir_impl_test.go
dumpling/export/string_chunking.go
dumpling/export/string_chunking_test.go
dumpling/tests/composite_string_key/run.sh
dumpling/tests/composite_string_key_large/run.sh

🚧 Files skipped from review as they are similar to previous changes (4)

dumpling/export/ir_impl_test.go
dumpling/export/dump_test.go
dumpling/tests/composite_string_key_large/run.sh
dumpling/tests/composite_string_key/run.sh

- string_chunking_test.go: add the standard TiDB Apache-2.0 license header (matches the rest of the package). - string_chunking.go: clear currentBoundary in the QuerySQL reset callback so a retried boundary-sample query cannot inherit a partial row from a previous attempt. Without this, the loop's `len(currentBoundary) == 0` check after the call could see stale data even when the retried query returned no rows. - task.go: render TaskTableData.Brief() as `(idx/?)` when TotalChunks is the streaming sentinel (< 0). The string-chunking buffering path emits intermediate chunks with TotalChunks=-1; only the final chunk is patched with the real total. The writer's `ChunkIndex+1 == TotalChunks` finality check is unchanged. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

coderabbitai

Actionable comments posted: 2

🧹 Nitpick comments (2)

dumpling/export/string_chunking_test.go (1)

155-162: Redundant bounds assertions after exact equality.

Once require.Equal(t, int64(1000000), int64(maxChunkLimit), ...) passes, the two follow-up Greater/Less checks cannot fail and add no coverage. Consider dropping them, or replacing the exact-equality assertion with just the bounds if the intent is to tolerate future tweaks to the constant.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@dumpling/export/string_chunking_test.go` around lines 155 - 162, The test
TestMaxChunkLimitSafety currently asserts exact equality on maxChunkLimit and
then redundantly checks bounds with require.Greater/require.Less; either remove
the redundant Greater/Less calls or change the equality assertion to only assert
the acceptable range. Edit the TestMaxChunkLimitSafety function to keep a single
style: if you want a fixed sentinel value keep require.Equal(t, int64(1000000),
int64(maxChunkLimit), ...) and delete the subsequent
require.Greater/require.Less lines, or instead replace the require.Equal line
with two range checks using require.GreaterOrEqual/require.LessOrEqual (or
require.Greater/require.Less) against the chosen lower/upper bounds to allow
future tweaks to maxChunkLimit.

dumpling/export/string_chunking.go (1)

58-111: Reuse beginChunkTracking instead of open-coding the chunk-tracking handshake.

dumpling/export/dump.go:1013-1029 exposes beginChunkTracking(meta) which does exactly what Lines 59-60 + Lines 95-110 do here (register chunkStats in d.chunkedTables, return a finalizer that sets finalized=true, atomic LoadAndDelete to claim finishedTablesCounter once sent == finished). Reusing it keeps the handshake contract in one place — otherwise any future fix to the producer-side termination protocol has to be mirrored in two call sites.

♻️ Proposed refactor

 	// Initialize chunk tracking
-	chunkStats := newTableChunkStat()
-	d.chunkedTables.Store(meta.ChunkKey(), chunkStats)
+	finishChunkTracking := d.beginChunkTracking(meta)
@@
 	defer func() {
 		// Send any remaining buffered chunk as the last chunk
 		if buffer != nil {
 			_ = sendBufferedChunk(true) // Mark as last chunk
 		}
 
-		chunkStats.finalized.Store(true)
-
 		tctx.L().Debug("streaming chunking complete",
 			zap.String("database", db),
 			zap.String("table", tbl),
 			zap.Int64("totalChunks", totalChunks))
 
-		if chunkStats.finished.Load() == chunkStats.sent.Load() {
-			// Atomic claim via LoadAndDelete: if the last writer callback
-			// already handled termination, Delete returns loaded==false
-			// and we skip the increment to avoid double-counting the
-			// finishedTablesCounter.
-			if _, loaded := d.chunkedTables.LoadAndDelete(meta.ChunkKey()); loaded {
-				IncCounter(d.metrics.finishedTablesCounter)
-			}
-		}
+		finishChunkTracking()
 	}()

As per coding guidelines: "Code SHOULD remain maintainable for future readers with basic TiDB familiarity" and follow existing package-local conventions.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@dumpling/export/string_chunking.go` around lines 58 - 111, Replace the ad-hoc
chunk-tracking handshake (creating chunkStats, storing into d.chunkedTables, and
the deferred finalizer that sets finalized and does LoadAndDelete/IncCounter)
with the existing beginChunkTracking(meta) helper: call beginChunkTracking(meta)
to obtain the chunkStats and the returned finalizer function, keep the
bufferedChunk/sendBufferedChunk logic and totalChunks variable as-is, and in the
defer first flush any remaining buffered chunk then invoke the returned
finalizer instead of the current block that sets chunkStats.finalized and does
LoadAndDelete/IncCounter; remove the duplicated manual d.chunkedTables.Store and
finalization code so the handshake is centralized in beginChunkTracking.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@dumpling/export/string_chunking.go`:
- Around line 209-224: The loop that samples boundaries should not swallow
errors: in the block where conn.QuerySQL returns err (currently logged via
tctx.L().Warn and then break), change the logic to return that error up to the
caller (preserving context in the log if desired) so the caller/job-level
retry/backoff can handle it; specifically modify the error branch handling
around conn.QuerySQL/currentBoundary/previousBoundary in the boundary-collection
routine (the code that logs "failed to sample boundary, stopping boundary
collection") to return the error instead of breaking, ensuring
totalChunks/previousBoundary are not used to produce degenerate full-table or
tail chunks.
- Around line 174-199: The fallback in the type switch that sets
currentBoundary[j] = fmt.Sprintf("%v", v) silently accepts unmapped types;
replace this silent fallback with explicit error handling: in the switch's
default case return a descriptive error (including the column index/name from
orderByColumns, the encountered Go type and value) indicating an unsupported
column type for composite-key chunking, and propagate that error by updating the
surrounding function signature and callers to return/handle an error;
alternatively, if you prefer documentation instead, add a clear comment above
the currentBoundary/type-switch block listing the supported driver-returned
types (string, []byte, int64/int32/int, float64/float32) and state that any
other types will be treated as unsupported.

---

Nitpick comments:
In `@dumpling/export/string_chunking_test.go`:
- Around line 155-162: The test TestMaxChunkLimitSafety currently asserts exact
equality on maxChunkLimit and then redundantly checks bounds with
require.Greater/require.Less; either remove the redundant Greater/Less calls or
change the equality assertion to only assert the acceptable range. Edit the
TestMaxChunkLimitSafety function to keep a single style: if you want a fixed
sentinel value keep require.Equal(t, int64(1000000), int64(maxChunkLimit), ...)
and delete the subsequent require.Greater/require.Less lines, or instead replace
the require.Equal line with two range checks using
require.GreaterOrEqual/require.LessOrEqual (or require.Greater/require.Less)
against the chosen lower/upper bounds to allow future tweaks to maxChunkLimit.

In `@dumpling/export/string_chunking.go`:
- Around line 58-111: Replace the ad-hoc chunk-tracking handshake (creating
chunkStats, storing into d.chunkedTables, and the deferred finalizer that sets
finalized and does LoadAndDelete/IncCounter) with the existing
beginChunkTracking(meta) helper: call beginChunkTracking(meta) to obtain the
chunkStats and the returned finalizer function, keep the
bufferedChunk/sendBufferedChunk logic and totalChunks variable as-is, and in the
defer first flush any remaining buffered chunk then invoke the returned
finalizer instead of the current block that sets chunkStats.finalized and does
LoadAndDelete/IncCounter; remove the duplicated manual d.chunkedTables.Store and
finalization code so the handshake is centralized in beginChunkTracking.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: a23a23a8-56c1-4b45-9b90-cc733d9abe96

📥 Commits

Reviewing files that changed from the base of the PR and between 5068102 and 3cf73d2.

📒 Files selected for processing (3)

dumpling/export/string_chunking.go
dumpling/export/string_chunking_test.go
dumpling/export/task.go

…column types - Boundary-sampling failure used to log a warning and `break`, which silently dropped into the post-loop branches. On a first-iteration failure the table was dumped as a single un-chunked task; on a later failure the dump emitted an oversized "tail" chunk from the last good boundary to +∞. Either way the caller never saw the failure. Return the error so the job-level retry/backoff applies. - Replace the silent `fmt.Sprintf("%v", v)` fallback in the row-scan type switch with an explicit error referencing the column name and Go type. The MySQL driver returns string/[]byte/int*/float* for the supported boundary types (parseTime is not enabled in GetDriverConfig), so any other type means a future caller widened the chunking surface beyond what the WHERE-clause builders support and we should fail loudly rather than emit a malformed predicate. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (1)

dumpling/export/string_chunking.go (1)

244-259: Optional: deduplicate the per-chunk task construction.

The two branches differ only in the WHERE-clause builder; the buildWhereCondition / buildSelectQuery / newTaskTableData calls are identical. Extracting the where-clause selection makes intent slightly clearer and removes the copy-paste.

♻️ Suggested refactor

-		// Create task for chunk using previousBoundary -> currentBoundary (with buffering)
-		var newTask *TaskTableData
-
-		if len(previousBoundary) == 0 {
-			// First chunk: everything up to first boundary
-			whereClause := buildUpperBoundWhereClause(orderByColumns, currentBoundary)
-			fullWhere := buildWhereCondition(conf, whereClause)
-			query := buildSelectQuery(db, tbl, selectField, "", fullWhere, orderByClause)
-			newTask = d.newTaskTableData(meta, newTableData(query, selectLen, false), int(totalChunks), -1)
-		} else {
-			// Intermediate chunk: between previousBoundary and currentBoundary
-			whereClause := buildBoundedWhereClause(orderByColumns, previousBoundary, currentBoundary)
-			fullWhere := buildWhereCondition(conf, whereClause)
-			query := buildSelectQuery(db, tbl, selectField, "", fullWhere, orderByClause)
-			newTask = d.newTaskTableData(meta, newTableData(query, selectLen, false), int(totalChunks), -1)
-		}
+		// Create task for chunk using previousBoundary -> currentBoundary (with buffering).
+		// First chunk: everything strictly less than the first boundary.
+		// Intermediate chunk: bounded range [previousBoundary, currentBoundary).
+		var whereClause string
+		if len(previousBoundary) == 0 {
+			whereClause = buildUpperBoundWhereClause(orderByColumns, currentBoundary)
+		} else {
+			whereClause = buildBoundedWhereClause(orderByColumns, previousBoundary, currentBoundary)
+		}
+		fullWhere := buildWhereCondition(conf, whereClause)
+		query := buildSelectQuery(db, tbl, selectField, "", fullWhere, orderByClause)
+		newTask := d.newTaskTableData(meta, newTableData(query, selectLen, false), int(totalChunks), -1)

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@dumpling/export/string_chunking.go` around lines 244 - 259, The per-chunk
task construction duplicates identical calls; refactor by computing the WHERE
clause into a single variable using buildUpperBoundWhereClause(orderByColumns,
currentBoundary) when previousBoundary is empty else
buildBoundedWhereClause(orderByColumns, previousBoundary, currentBoundary), then
call buildWhereCondition(conf, whereClause), buildSelectQuery(db, tbl,
selectField, "", fullWhere, orderByClause) and create the task with
d.newTaskTableData(meta, newTableData(query, selectLen, false),
int(totalChunks), -1) only once to remove the duplicate blocks.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@dumpling/export/string_chunking.go`:
- Around line 19-21: The code can divide by zero when d.conf.Rows is 0; add a
defensive check before computing chunk sizes in concurrentDumpStringFields (and
the integer-column chunking function, e.g., concurrentDumpIntFields) to ensure
rowsPerChunk := int64(d.conf.Rows) > 0; if it is 0, set rowsPerChunk = 1 (or
return a clear error) and then use rowsPerChunk for numChunks calculation so the
division cannot panic; update uses of chunkSize/rowsPerChunk consistently in
that function and the corresponding integer chunking function.

---

Nitpick comments:
In `@dumpling/export/string_chunking.go`:
- Around line 244-259: The per-chunk task construction duplicates identical
calls; refactor by computing the WHERE clause into a single variable using
buildUpperBoundWhereClause(orderByColumns, currentBoundary) when
previousBoundary is empty else buildBoundedWhereClause(orderByColumns,
previousBoundary, currentBoundary), then call buildWhereCondition(conf,
whereClause), buildSelectQuery(db, tbl, selectField, "", fullWhere,
orderByClause) and create the task with d.newTaskTableData(meta,
newTableData(query, selectLen, false), int(totalChunks), -1) only once to remove
the duplicate blocks.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 7ee62c21-aa60-4675-889d-93eeb8b7dc79

📥 Commits

Reviewing files that changed from the base of the PR and between 3cf73d2 and 8e5eb75.

📒 Files selected for processing (1)

dumpling/export/string_chunking.go

coderabbitai · 2026-04-25T06:30:23Z

+func (d *Dumper) concurrentDumpStringFields(tctx *tcontext.Context, conn *BaseConn, meta TableMeta, taskChan chan<- Task, fields []string, orderByClause string, estimatedCount uint64) error {
+	chunkSize := int64(d.conf.Rows)
+	numChunks := (int64(estimatedCount) + chunkSize - 1) / chunkSize


⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash # Locate the call site and inspect the surrounding guard. rg -nP --type=go -C8 'concurrentDumpStringFields\s*\(' dumpling/export/ # Also check whether conf.Rows == 0 is filtered earlier in the chunking decision tree. rg -nP --type=go -C3 '\bconf\.Rows\b|\bRows\s*==\s*0|\bRows\s*<=\s*0' dumpling/export/dump.go

Repository: pingcap/tidb

Length of output: 4602

🏁 Script executed:

#!/bin/bash # Find the definition of UnspecifiedSize rg -nP --type=go 'UnspecifiedSize\s*(?:=|:)' dumpling/ # Find GetSuitableRows definition rg -nP --type=go -A5 'func\s+GetSuitableRows' dumpling/ # Check if conf.Rows is validated anywhere rg -nP --type=go 'conf\.Rows.*<=\s*0|conf\.Rows.*==\s*0' dumpling/

Repository: pingcap/tidb

Length of output: 445

🏁 Script executed:

#!/bin/bash # Get the full GetSuitableRows function sed -n '584,620p' dumpling/export/sql.go

Repository: pingcap/tidb

Length of output: 1154

🏁 Script executed:

#!/bin/bash # Get context around the goroutine path to confirm the flow sed -n '709,725p' dumpling/export/dump.go

Repository: pingcap/tidb

Length of output: 593

🏁 Script executed:

#!/bin/bash # Get the full context of concurrentDumpTable around the string field check sed -n '860,930p' dumpling/export/dump.go

Repository: pingcap/tidb

Length of output: 2951

Add a defensive check to prevent division by zero if d.conf.Rows is 0.

If a very large average row length causes GetSuitableRows() to return 0 (when bytesPerFile / avgRowLength rounds down to 0), then conf.Rows becomes 0. The guard at line 898 (if count < conf.Rows) fails to catch this case since count is a uint64 that cannot be less than 0. This allows concurrentDumpStringFields to be called with chunkSize = 0, causing division by zero at line 21. The same risk applies to line 923 in the integer-column chunking path. Either validate conf.Rows > 0 before chunking operations, or ensure GetSuitableRows() guarantees a non-zero return.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@dumpling/export/string_chunking.go` around lines 19 - 21, The code can divide by zero when d.conf.Rows is 0; add a defensive check before computing chunk sizes in concurrentDumpStringFields (and the integer-column chunking function, e.g., concurrentDumpIntFields) to ensure rowsPerChunk := int64(d.conf.Rows) > 0; if it is 0, set rowsPerChunk = 1 (or return a clear error) and then use rowsPerChunk for numChunks calculation so the division cannot panic; update uses of chunkSize/rowsPerChunk consistently in that function and the corresponding integer chunking function.

ti-chi-bot · 2026-04-25T06:44:54Z

@takaidohigasi: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name	Commit	Details	Required	Rerun command
non-block/pull-unit-test-next-gen	`b3a7f18`	link	false	`/test pull-unit-test-next-gen`
non-block/pull-integration-e2e-test-next-gen	`b3a7f18`	link	false	`/test pull-integration-e2e-test-next-gen`
non-block/pull-mysql-test-next-gen	`b3a7f18`	link	false	`/test pull-mysql-test-next-gen`

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

takaidohigasi · 2026-04-25T07:47:37Z

/test pull-integration-realcluster-test-next-gen

ti-chi-bot · 2026-04-25T07:47:54Z

@takaidohigasi: The specified target(s) for /test were not found.
The following commands are available to trigger required jobs:

/test build

/test check-dev

/test check-dev2

/test mysql-test

/test pull-br-integration-test

/test pull-build-next-gen

/test pull-integration-e2e-test

/test pull-integration-realcluster-test-next-gen

/test pull-lightning-integration-test

/test pull-mysql-client-test

/test pull-mysql-client-test-next-gen

/test pull-unit-test-ddlv1

/test pull-unit-test-next-gen

/test unit-test

The following commands are available to trigger optional jobs:

/test pingcap/tidb/canary_ghpr_unit_test

/test pull-br-integration-test-next-gen

/test pull-check-deps

/test pull-common-test

/test pull-e2e-test

/test pull-error-log-review

/test pull-integration-common-test

/test pull-integration-copr-test

/test pull-integration-ddl-test

/test pull-integration-ddl-test-next-gen

/test pull-integration-e2e-test-next-gen

/test pull-integration-jdbc-test

/test pull-integration-mysql-test

/test pull-integration-nodejs-test

/test pull-integration-python-orm-test

/test pull-mysql-test-next-gen

/test pull-sqllogic-test

/test pull-tiflash-integration-test

Use /test all to run the following jobs that were automatically triggered:

pingcap/tidb/ghpr_build

pingcap/tidb/ghpr_check

pingcap/tidb/ghpr_check2

pingcap/tidb/ghpr_mysql_test

pingcap/tidb/ghpr_unit_test

pingcap/tidb/pull_build_next_gen

pingcap/tidb/pull_integration_e2e_test

pingcap/tidb/pull_integration_realcluster_test_next_gen

pingcap/tidb/pull_mysql_client_test

pingcap/tidb/pull_mysql_client_test_next_gen

pingcap/tidb/pull_unit_test_next_gen

pull-check-deps

pull-error-log-review

Details

In response to this:

/test pull-integration-realcluster-test-next-gen

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

tiprow · 2026-04-25T07:48:01Z

@takaidohigasi: The specified target(s) for /test were not found.
The following commands are available to trigger required jobs:

/test fast_test_tiprow

/test tidb_parser_test

Use /test all to run all jobs.

Details

In response to this:

/test pull-integration-realcluster-test-next-gen

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

takaidohigasi · 2026-04-25T07:48:55Z

/test pull-integration-realcluster-test-next-gen

tiprow · 2026-04-25T07:49:19Z

@takaidohigasi: The specified target(s) for /test were not found.
The following commands are available to trigger required jobs:

/test fast_test_tiprow

/test tidb_parser_test

Use /test all to run all jobs.

Details

In response to this:

/test pull-integration-realcluster-test-next-gen

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

takaidohigasi · 2026-04-29T02:54:05Z

thanks. ready again.

takaidohigasi · 2026-05-13T21:35:52Z

/retest required

ti-chi-bot · 2026-05-13T21:36:10Z

@takaidohigasi: The /retest command does not accept any targets.
The following commands are available to trigger required jobs:

/test build

/test check-dev

/test check-dev2

/test mysql-test

/test pull-br-integration-test

/test pull-build-next-gen

/test pull-integration-e2e-test

/test pull-integration-realcluster-test-next-gen

/test pull-lightning-integration-test

/test pull-mysql-client-test

/test pull-mysql-client-test-next-gen

/test pull-unit-test-ddlv1

/test pull-unit-test-next-gen

/test unit-test

The following commands are available to trigger optional jobs:

/test pingcap/tidb/canary_ghpr_unit_test

/test pull-br-integration-test-next-gen

/test pull-check-deps

/test pull-common-test

/test pull-e2e-test

/test pull-error-log-review

/test pull-integration-common-test

/test pull-integration-copr-test

/test pull-integration-ddl-test

/test pull-integration-ddl-test-next-gen

/test pull-integration-e2e-test-next-gen

/test pull-integration-jdbc-test

/test pull-integration-mysql-test

/test pull-integration-nodejs-test

/test pull-integration-python-orm-test

/test pull-mysql-test-next-gen

/test pull-sqllogic-test

/test pull-tiflash-integration-test

Use /test all to run the following jobs that were automatically triggered:

pingcap/tidb/ghpr_build

pingcap/tidb/ghpr_check

pingcap/tidb/ghpr_check2

pingcap/tidb/ghpr_mysql_test

pingcap/tidb/ghpr_unit_test

pingcap/tidb/pull_build_next_gen

pingcap/tidb/pull_integration_e2e_test

pingcap/tidb/pull_integration_realcluster_test_next_gen

pingcap/tidb/pull_mysql_client_test

pingcap/tidb/pull_mysql_client_test_next_gen

pingcap/tidb/pull_unit_test_next_gen

pull-check-deps

pull-error-log-review

Details

In response to this:

/retest required

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

tiprow · 2026-05-13T21:36:21Z

@takaidohigasi: The /retest command does not accept any targets.
The following commands are available to trigger required jobs:

/test fast_test_tiprow

/test tidb_parser_test

Use /test all to run all jobs.

Details

In response to this:

/retest required

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

takaidohigasi · 2026-05-19T05:10:48Z

it was ready, but conflicted now, so will resolve it.

…y-handling # Conflicts: # dumpling/export/ir.go # dumpling/export/util_for_test.go

ti-chi-bot · 2026-05-19T05:25:03Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign gmhdbjd for approval. For more information see the Code Review Process.
Please ensure that each of them provides their approval before proceeding.

The full list of commands accepted by this bot can be found here.

Details

Needs approval from an approver in each of these files:

dumpling/OWNERS

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

pantheon-ai · 2026-05-19T05:25:26Z

@takaidohigasi I've received your pull request and will start the review. I'll conduct a thorough review covering code quality, potential issues, and implementation details.

⏳ This process typically takes 10-30 minutes depending on the complexity of the changes.

_{ℹ️ Learn more details on Pantheon AI.}

ti-chi-bot Bot added size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. needs-ok-to-test Indicates a PR created by contributors and need ORG member send '/ok-to-test' to start testing. labels Jul 3, 2025

takaidohigasi changed the title ~~[pingcap/tidb#61999] improve dumpling string key handling~~ dumpling: improve dumpling string key handling [pingcap/tidb#61999] Jul 3, 2025

ti-chi-bot Bot removed do-not-merge/invalid-title do-not-merge/needs-tests-checked labels Jul 3, 2025

This comment was marked as resolved.

Sign in to view

ti-chi-bot Bot added ok-to-test Indicates a PR is ready to be tested. and removed needs-ok-to-test Indicates a PR created by contributors and need ORG member send '/ok-to-test' to start testing. labels Jul 3, 2025

ti-chi-bot Bot removed the do-not-merge/needs-linked-issue label Jul 3, 2025

takaidohigasi force-pushed the improve-string-key-handling branch from 37679fe to b541789 Compare July 3, 2025 05:33

lance6716 changed the title ~~dumpling: improve dumpling string key handling [pingcap/tidb#61999]~~ dumpling: improve dumpling string key handling Jul 3, 2025

takaidohigasi force-pushed the improve-string-key-handling branch 2 times, most recently from d0d4277 to 9436eee Compare July 3, 2025 06:15

ti-chi-bot Bot added release-note Denotes a PR that will be considered when it comes time to generate release notes. and removed release-note-none Denotes a PR that doesn't merit a release note. labels Jul 3, 2025

takaidohigasi mentioned this pull request Jul 3, 2025

dumpling: Parallel processing implementation for string-type #61999

Open

takaidohigasi force-pushed the improve-string-key-handling branch from 667c0d0 to 0e948e0 Compare July 3, 2025 08:07

This comment was marked as resolved.

Sign in to view

takaidohigasi and others added 3 commits April 24, 2026 19:58

takaidohigasi commented Apr 24, 2026

View reviewed changes

Comment thread dumpling/export/dump.go

coderabbitai Bot reviewed Apr 24, 2026

View reviewed changes

takaidohigasi and others added 3 commits April 25, 2026 08:30

coderabbitai Bot reviewed Apr 24, 2026

View reviewed changes

Comment thread dumpling/export/string_chunking_test.go

Comment thread dumpling/export/string_chunking.go Outdated

Comment thread dumpling/export/string_chunking.go

coderabbitai Bot reviewed Apr 25, 2026

View reviewed changes

Comment thread dumpling/export/string_chunking.go

Comment thread dumpling/export/string_chunking.go

coderabbitai Bot reviewed Apr 25, 2026

View reviewed changes

Merge remote-tracking branch 'upstream/master' into improve-string-ke…

0610dc4

…y-handling # Conflicts: # dumpling/export/ir.go # dumpling/export/util_for_test.go

Conversation

takaidohigasi commented Jul 3, 2025 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Latest state (2026-04-24)

What problem does this PR solve?

What changed and how does it work?

simplified diagram

Check List

Release note

Summary by CodeRabbit

Uh oh!

ti-chi-bot Bot commented Jul 3, 2025

Uh oh!

ti-chi-bot Bot commented Jul 3, 2025

Uh oh!

tiprow Bot commented Jul 3, 2025

Uh oh!

This comment was marked as resolved.

lance6716 commented Jul 3, 2025

Uh oh!

codecov Bot commented Jul 3, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

This comment was marked as resolved.

This comment was marked as resolved.

takaidohigasi commented Apr 24, 2026

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

takaidohigasi commented Apr 24, 2026

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Apr 25, 2026

Choose a reason for hiding this comment

Uh oh!

ti-chi-bot Bot commented Apr 25, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

takaidohigasi commented Apr 25, 2026

Uh oh!

ti-chi-bot Bot commented Apr 25, 2026

Uh oh!

tiprow Bot commented Apr 25, 2026

Uh oh!

takaidohigasi commented Apr 25, 2026

Uh oh!

tiprow Bot commented Apr 25, 2026

Uh oh!

takaidohigasi commented Apr 29, 2026

Uh oh!

takaidohigasi commented May 13, 2026

Uh oh!

ti-chi-bot Bot commented May 13, 2026

Uh oh!

tiprow Bot commented May 13, 2026

takaidohigasi commented Jul 3, 2025 •

edited by coderabbitai Bot

Loading

codecov Bot commented Jul 3, 2025 •

edited

Loading

ti-chi-bot Bot commented Apr 25, 2026 •

edited

Loading

pantheon-ai Bot commented May 19, 2026 •

edited

Loading